We propose the design of the Enhanced 3D-RS motion estimation algorithm for H.263
video coding, which significantly improves coding efficiency with respect to the
conventional 3D-RS algorithm while keeping the increase in computational effort low.
For typical videoconferencing sequences, the Enhanced 3D-RS algorithm performs well
when compared to full-search motion estimation; furthermore, the recursive strategy of
the proposed algorithm improves the noise robustness of the estimated motion field, in
that its compression gain is comparable to that of full-search motion estimation for
noisy sequences. The Enhanced 3D-RS motion estimation algorithm has been successfully
integrated with the H.263 video codec for the Philips Trimedia processor (TM1000);
real-time experiments have shown very good perceptual quality of the coded pictures.



1 Introduction 

The H.263 standard for low bitrate video-conferencing [1]-[2] is based on a video
compression procedure which exploits the high degree of spatial and temporal correlation
in natural video sequences. The hybrid DPCM/DCT coding removes temporal redundancy
using inter-frame motion compensation. The residual error images are further processed
by a block Discrete Cosine Transform (DCT), which reduces spatial redundancy by
decorrelating the pixels within a block and concentrating the energy of the block into
a few low order coefficients. The DCT coefficients are then quantized according to a fixed
quantization matrix that is scaled by a Scalar Quantization factor (SQ). Finally, Variable
Length Coding (VLC) achieves high encoding efficiency and produces a bitstream, which
is transmitted over ISDN (digital) or PSTN (analogue) channels at constant bitrates.
Due to the intrinsic structure of H.263, the final bitstream is produced at variable bitrate;
hence it has to be transformed to constant bitrate by the insertion of an output buffer
which acts as a feedback controller. The buffer controller has to achieve a target bitrate
with consistent visual quality, low delay and low complexity. It monitors the amount
of bits produced and dynamically adjusts the quantization parameters, according to its
fullness status and to the image complexity.

The H.263 coding standard defines the techniques to be used and the syntax of the
bitstream. There are some degrees of freedom in the design of the encoder: the standard
places no constraints on important processing stages such as motion estimation, adaptive
scalar quantization, and bit-rate control.

As far as motion estimation is concerned, block-matching motion estimation
algorithms are usually adopted to estimate the motion field between the current frame
to be coded and the previous decoded frame. The objective of motion field estimation
for typical hybrid coding schemes is to achieve high motion-compensation performance;
however, the evaluation of a large number of candidate vectors for each block can create
a huge burden. To save computational effort, a clever search strategy can avoid the
need to check all possible vectors.

In order to estimate the motion field related to the sequence to be coded, it is possible to
use the 3-Dimensional Recursive Search block matching algorithm, presented in [5] and
[6]. Unlike the more expensive full-search block matchers, which evaluate all the possible
displacements within a search area, this algorithm only investigates a very limited number
of possible displacements. By carefully choosing the candidate vectors, a high performance
can be achieved, approaching almost true motion, with a low complexity design.
The 3D-RS algorithm stimulates coherency of the vector field by employing recursion.
However, in the H.263 video coding context, the extremely smooth estimated motion field
impairs the efficiency of the resulting displacement-compensated image prediction. Thus,
a compromise must be found between minimizing the entropy of the displacement vectors
and minimizing the displaced frame difference between temporally adjacent frames.
For this purpose, we propose the design of the Enhanced 3D-RS motion estimation
algorithm, which significantly improves the performance in terms of coding efficiency
and leads to very good perceptual quality of the coded pictures, while keeping the
increase in computational load reasonably low.
The organization of this document is as follows: Section 2 briefly summarizes the motion
estimation part of the video codec and introduces the design of the 3D-RS motion
estimation algorithm. In Section 3 we describe the architecture of the proposed Enhanced
3D-RS algorithm. In Section 4 the performance in terms of coding efficiency and source
distortion is reported. Finally, in Section 5 the conclusions are drawn.

2 Encoding strategy: motion estimation techniques

Motion estimation is part of the inter coding principle. Macroblocks of the current frame
are matched to the previously coded frame. In other words, for a specific position in the
current frame, the best match is searched for in the previous frame, possibly on slightly
translated coordinates. The translation giving this best match is referred to as the
displacement vector. The difference image between the current block and the translated
block in the previous frame is referred to as the motion compensated signal. This signal
is forwarded to the coding part, in combination with the displacement vector.

2.1 Basic concepts 

In block-matching motion estimation algorithms, a displacement vector, or motion vector
$d(b_c, t)$, is assigned to the centre $b_c = (x_c, y_c)^{tr}$ of a block of pixels $B(b_c)$
in the current image $I(x, t)$, where $tr$ means transpose. The assignment is done if
$B(b_c)$ matches a similar block within a search area $SA(b_c)$, also centred at $b_c$, but
in the previous image $I(x, t - T)$. The similar block has a centre which is shifted with
respect to $b_c$ over the motion vector $d(b_c, t)$. To find $d(b_c, t)$, a number of
candidate vectors $C$ are evaluated applying an error measure $e(C, b_c, t)$ to quantify
block similarity. Figure 1 illustrates the procedure.




Figure 1: Illustration of block-matching (axis label: picture number).

The pixels in the block $B(b_c)$ have the following positions:

$$x_c - X/2 \le x \le x_c + X/2$$
$$y_c - Y/2 \le y \le y_c + Y/2$$

with $X$ and $Y$ the block width and block height respectively, and $x = (x, y)^{tr}$ the
spatial position in the image.
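A minimal sketch of one possible error measure $e(C, b_c, t)$: the sum of absolute differences (SAD) between the current block and the displaced block in the previous image. The text does not name the matching criterion, so SAD, the function name `block_error`, the top-left block addressing (instead of the centre $b_c$) and the clamping of out-of-frame coordinates are all assumptions made for illustration.

```python
def block_error(cur, prev, bx, by, cand, bw=16, bh=16):
    """SAD between the block of `cur` with top-left corner (bx, by) and the
    block of `prev` displaced by the candidate vector cand = (dx, dy)."""
    height, width = len(cur), len(cur[0])
    dx, dy = cand
    sad = 0
    for y in range(by, min(by + bh, height)):
        for x in range(bx, min(bx + bw, width)):
            # Clamp the displaced position so it stays inside the previous frame
            # (an assumed border policy).
            px = min(max(x + dx, 0), width - 1)
            py = min(max(y + dy, 0), height - 1)
            sad += abs(cur[y][x] - prev[py][px])
    return sad


# Synthetic frames: `cur` is `prev` shifted right by one pixel.
prev = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
cur = [[prev[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
zero_err = block_error(cur, prev, 2, 2, (0, 0), bw=4, bh=4)
true_err = block_error(cur, prev, 2, 2, (-1, 0), bw=4, bh=4)
```

With this sign convention the candidate (-1, 0) points back to where the block content came from, so its error vanishes while the zero vector leaves a residual.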

Although the cost function itself can be rather straightforward and simple to implement,
the high repetition factor for this calculation creates a huge burden. This occurs if many
candidate vectors are evaluated, i.e. if large search areas are considered. To save
computational effort in block-matching motion estimation algorithms, a clever search
strategy has to be designed, so that not all possible vectors need to be checked.
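To make the repetition factor concrete, the following sketch implements an exhaustive (full-search) matcher and counts how many times the cost function is called for a single block. The SAD criterion, the small 4 x 4 block, the search range of 2 pixels and the names `sad` and `full_search` are illustrative assumptions, not taken from the text.

```python
def sad(cur, prev, bx, by, dx, dy, size):
    """Sum of absolute differences for a size x size block; coordinates are
    clamped to the frame (an assumed border policy)."""
    h, w = len(cur), len(cur[0])
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            px = min(max(x + dx, 0), w - 1)
            py = min(max(y + dy, 0), h - 1)
            total += abs(cur[y][x] - prev[py][px])
    return total


def full_search(cur, prev, bx, by, size=4, search_range=2):
    """Evaluate every integer displacement in the window and return
    (best_vector, best_error, number_of_evaluations)."""
    best, best_err, evaluated = (0, 0), None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            err = sad(cur, prev, bx, by, dx, dy, size)
            evaluated += 1
            if best_err is None or err < best_err:
                best, best_err = (dx, dy), err
    return best, best_err, evaluated


prev = [[(7 * x + 11 * y + x * y) % 23 for x in range(12)] for y in range(12)]
# Current frame: previous frame content moved right by 2 and down by 1.
cur = [[prev[max(y - 1, 0)][max(x - 2, 0)] for x in range(12)] for y in range(12)]
vector, error, evaluated = full_search(cur, prev, 4, 4)
```

Even this tiny window costs 25 block comparisons; the count grows as the square of the search range, which is the burden a reduced candidate set is meant to avoid.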

2.2 3-Dimensional Recursive Search

The high quality 3-Dimensional Recursive Search block matching algorithm, presented in
[5] and [6], only investigates a very limited number of possible displacements. By carefully
choosing the candidate vectors, a high performance can be achieved, approaching almost
true motion, with a low complexity design. Its attractiveness was earlier proven in an IC
for SD-TV consumer applications [7].

The 3D-RS algorithm stimulates smoothness of the vector field by employing recursion.
In this case the motion field $d(t)$ is given by

$$d(t) \ni d(b_c, t) = \{\, C \in CS(b_c, t) \mid e(C, b_c, t) \le e(Z, b_c, t) \;\; \forall Z \in CS(b_c, t) \,\} \tag{1}$$

$$CS(b_c, t) = \{\, d(b_c - \ldots),\ d(b_c - \ldots),\ \ldots \,\}$$

$$U_{1\,x,y} = \mathrm{random}(-a_1, \ldots, a_1), \qquad U_{2\,x,y} = \mathrm{random}(-a_2, \ldots, a_2)$$

where $\mathrm{random}(-a, \ldots, a)$ denotes a random choice from the range $[-a, a]$. Figure 2
shows the block diagram of the 3D-RS algorithm.



Figure 2: 3D-RS algorithm block diagram (labels: 3D-RS motion estimator, error measure
e(C(b_c)), selected vector d(b_c), motion field memory, d(t), predictor vectors).

The candidate set $CS(b_c, t)$ consists of 5 vectors: three predictor vectors from a
spatio-temporal neighborhood, and two vectors obtained by adding a random update vector to
the motion vector estimated for the previous block. This implicitly assumes spatial and/or
temporal consistency. Figure 3 shows where the spatial and spatio-temporal prediction
vectors are located relative to the current block. In [8] a half-pixel accuracy 3-D Recursive
Search block-matcher is proposed, where $[-a_1, a_1] = [-1, 1]$ and $[-a_2, a_2] = [-6, 6]$.
The 3D-RS algorithm leads to extremely smooth vector fields. This reflects its
improved coherency strategy (recursive search with spatial and temporal candidates).
However, low bitrate H.263 video coding leads to quite poor video quality; moreover,
with CIF or QCIF formats, the number of 16 x 16 blocks is relatively small, which
slows the convergence of the algorithm. Under certain circumstances these constraints
appear to be too strong, and they make the motion estimator fall into local minima of
the error.
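The five-vector candidate set described above can be sketched as follows. The update ranges 1 and 6 follow the values quoted from [8]; the neighbour positions used for the three predictors, the raster-scan assumption (the "previous block" is the left neighbour) and all function names are hypothetical, since the exact positions are not recoverable from the text.

```python
import random


def candidate_set(field, prev_field, i, j, rng, a1=1, a2=6):
    """Build five candidates for the block at row i, column j.

    `field` holds vectors already estimated in the current picture and
    `prev_field` the vectors of the previous picture. The neighbour
    positions below are illustrative assumptions, not the source's set.
    """
    rows, cols = len(prev_field), len(prev_field[0])

    def spatial(r, c):
        # Causal neighbour in the current field; zero vector outside the picture.
        return field[r][c] if 0 <= r < rows and 0 <= c < cols else (0, 0)

    def temporal(r, c):
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return prev_field[r][c]

    def update(a):
        return (rng.randint(-a, a), rng.randint(-a, a))

    left = spatial(i, j - 1)            # vector of the previous block
    u1, u2 = update(a1), update(a2)
    return [
        spatial(i - 1, j - 1),          # spatial predictor (assumed position)
        spatial(i - 1, j + 1),          # spatial predictor (assumed position)
        temporal(i + 1, j),             # temporal predictor (assumed position)
        (left[0] + u1[0], left[1] + u1[1]),
        (left[0] + u2[0], left[1] + u2[1]),
    ]


def select(candidates, error):
    """Return the candidate with the smallest error, as in Equation (1)."""
    return min(candidates, key=error)


rng = random.Random(0)
prev_field = [[(1, 0)] * 3 for _ in range(3)]
field = [[(0, 0)] * 3 for _ in range(3)]
field[0] = [(2, 0), (2, 1), (3, 0)]
field[1][0] = (2, 0)
cands = candidate_set(field, prev_field, 1, 1, rng)
best = select(cands, lambda c: abs(c[0] - 3) + abs(c[1]))
```

Only five error evaluations are needed per block, whatever the size of the search area; the stand-in error passed to `select` here is a toy function, where a real estimator would use the block-matching error.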




Figure 3: Positions of the prediction vectors relative to the current block (legend: block
in current field, current block, block in previous field; horizontal positions x-X, x, x+X).



3 The Enhanced 3D-RS motion estimation algorithm

If we take into account its low computational load (only five displacements are checked),
the 3D-RS algorithm is an efficient motion estimator. A comparison with the full-search
motion estimator is shown in [10]: for good quality images in the range of 32 to 37 dB
PSNR, the average P-frame bitrate increases by only some 5% to achieve the same
PSNR. However, in the H.263 video coding context the performance of the 3D-RS algorithm
is less satisfactory.

Generally speaking, a key to better compression performance is to increase the accuracy
of the motion estimator; for example, it is common to use half-pixel accuracy motion
estimation in MPEG or H.263 video coding. In this section, instead, we will refer to integer
pixel accuracy (without loss of generality) and describe the architecture of the proposed
Enhanced 3D-RS algorithm, which improves the performance in terms of coding efficiency
while keeping the increase in computational load reasonably low.

As mentioned in Section 2.2, the recursive strategy of the 3D-RS algorithm stimulates
the smoothness of the motion field; this is an advantage because the smoother the field,
the fewer bits are spent on motion data (due to the entropy encoding of the motion
information). However, the strong recursion of the 3D-RS algorithm may lead to local
minimum errors, which impair the efficiency of the resulting displacement-compensated
image prediction.

Nevertheless, we can regard the 3D-RS algorithm as a very efficient coarse motion estimator,
whose estimated motion field needs refining.

In [3],[4] it has been shown that one can exploit the temporal recursion of the algorithm by
iterating the estimation process several times, in such a way that the motion field calculated
in the previous iteration is used as the temporal candidate field in the current iteration.
Therefore, the refinement of the previous estimate is performed only after a complete vector
field has been computed.
A better solution would be to exploit the spatial recursion of the algorithm; if a
one-pixel search window refinement around each motion vector at macroblock
level is performed, the correction on the currently estimated motion vector is
immediately forwarded to the estimation of the next displacement vector.
This solution is shown in Figure 4; the refinement block, inserted into the recursive loop
of the estimator, enhances the convergence and speeds up the recursion of the algorithm.
In formulas, the motion field $d(t) \ni d(b_c, t) = d^2(b_c, t)$ is found as¹

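$$d^s(b_c, t) = \{\, C \in CS^s(b_c, t) \mid e(C, b_c, t) \le e(V, b_c, t) \;\; \forall V \in CS^s(b_c, t) \,\}, \quad s = 1, 2 \tag{2}$$

$$CS^1(b_c, t) = \{\, d^2(b_c - \ldots, t),\ d^2(b_c - \ldots, t),\ d^2(b_c - \ldots, t - T),\ d^2(b_c - \ldots, t) + U \,\}$$

$$U = \alpha \mathcal{R}, \qquad \mathcal{R} = \ldots$$

$$CS^2(b_c, t) = \{\, C \mid C = d^1(b_c, t) + R \,\}, \quad R_{x,y} \in \{0, +1, -1\}$$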

Equation (2) also shows that a different updating strategy, suitable for enhanced
estimation, can be adopted. No random updates are added to the spatial predictor; this can
be explained by the fact that the refinement process improves the accuracy of the motion
estimate; therefore, the displacement vector calculated for the previous block is supposed
to be a more reliable predictor to ensure convergence to an accurate motion field. In order
to enable quick convergence, the update vector $U$ is obtained by multiplying $\mathcal{R}$
by the updating step $\alpha$, where $\mathcal{R}$ is the refinement term related to the
previously computed motion vector; in this way the updating process adapts to the local
minimum direction. Experimental results showed that the proposed updating strategy leads
to some performance improvement with respect to the random strategy.

¹ Concerning the representation of the motion vectors $d(b_c, t)$ and the candidate sets
$CS(b_c, t)$, superscript $s$ refers to the step $s = 1, \ldots, N$ in the computation of the
motion field $d(t) \ni d(b_c, t) = d^N(b_c, t)$.
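A small sketch of the two ingredients of this updating strategy: the nine-vector refinement set built around the vector selected in step 1, and the deterministic update vector obtained by scaling the refinement term. Taking the refinement term as the difference between the refined and the unrefined vector of the previous block, the value 2 for the updating step, and the function names are assumptions for illustration.

```python
def refinement_set(d1):
    """Step-2 candidates: d1 plus offsets whose components are in {0, +1, -1}."""
    return [(d1[0] + rx, d1[1] + ry) for ry in (0, 1, -1) for rx in (0, 1, -1)]


def update_vector(d1, d2, alpha=2):
    """U = alpha * R, with R = d2 - d1 the correction found by the refinement
    for the previously processed block (alpha = 2 is an assumed value)."""
    return (alpha * (d2[0] - d1[0]), alpha * (d2[1] - d1[1]))


def update_candidate(d2_prev_block, u):
    """Spatial predictor of the previous block plus the deterministic update U."""
    return (d2_prev_block[0] + u[0], d2_prev_block[1] + u[1])


d1, d2 = (3, -1), (4, -1)           # vector before and after refinement
u = update_vector(d1, d2)            # refinement moved one pixel to the right
extra = update_candidate(d2, u)      # candidate pushed further in that direction
```

Because the refinement moved the vector to the right, the extra candidate probes further in the same direction instead of a random one.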



Figure 4: Enhanced 3-Dimensional Recursive Search block diagram (labels: update vector U,
d^1(b, t), integer pixel refinement, d^2(b, t), motion field memory, d(t), predictor vectors,
enhanced 3D-RS).



The total number of candidate vectors is 13. Note that, unlike iterative estimation, no 
additional delay is introduced; indeed, the displacement vectors are immediately available 
after processing each block. 
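As a rough worked comparison, the number of error evaluations per picture can be counted for the three estimators. The 5 and 13 candidates per block come from the text; the 16 x 16 block size is the one mentioned for CIF and QCIF, while the full-search window of plus or minus 15 integer pixels is an assumption chosen only to give an order of magnitude.

```python
def macroblocks(width, height, block=16):
    """Number of block x block macroblocks in a picture."""
    return (width // block) * (height // block)


def evaluations(width, height, candidates_per_block):
    """Error-measure evaluations needed for one picture."""
    return macroblocks(width, height) * candidates_per_block


SEARCH_RANGE = 15                            # assumed +/-15 integer-pixel window
FULL_SEARCH = (2 * SEARCH_RANGE + 1) ** 2    # 961 displacements per block

for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
    print(name, macroblocks(w, h),
          evaluations(w, h, 5),              # 3D-RS
          evaluations(w, h, 13),             # Enhanced 3D-RS
          evaluations(w, h, FULL_SEARCH))    # full search
```

Under these assumptions a QCIF picture needs 495 evaluations with 3D-RS, 1287 with the enhanced scheme and 95139 with full search, so the enhancement costs well under three times the original while remaining far below exhaustive search.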



4 Additional remarks 

This invention is not concerned with determining sets of candidate vectors.
In fact, I carried out many experiments on this and found that, as far as H.263 video coding
is concerned, one can hardly achieve better performance by simply modifying the set of
candidate vectors of the 3D-RS algorithm.

Therefore, I am not proposing to use a particular set of candidate vectors; without loss of
generality, I chose the set of candidate vectors according to the version of the 3D-RS
algorithm that has been implemented in the IMcIC chip. But I could have chosen any other
set from any other version of the 3D-RS algorithm that can be found in the literature; I could
also have chosen the candidate vectors of spatially neighboring blocks as candidate vectors,
according to EP 0,415,491.

In my opinion, and as far as I know, one of the important aspects of this invention is as
follows: since the 3D-RS algorithm for H.263 video coding provides quite inaccurate
motion vectors, the purpose of the enhancement module is to improve the accuracy of the
estimated motion field; thus, the enhancement can be regarded as a post-processor of
motion vectors. As the enhancement is a post-processing module, it is not involved in
determining the set of candidate vectors, as it processes the motion vector associated with
the present block once the best displacement has been selected out of the candidate
vectors.
Usually, post-processing is done once the whole motion field has been computed.

The new aspect of the enhancement of this invention is that better results can be achieved
by doing post-processing inside the recursion loop of the motion estimation algorithm,
provided that the motion field is computed by a recursive motion estimation algorithm.
This means that once the motion vector V0 has been selected out of the candidate vectors,
said motion vector V0 is refined to produce the motion vector V1, in that if the frame
difference corresponding to V1 is smaller than the frame difference associated with V0, V1
immediately replaces V0 before the new set of candidate vectors for the next block is
generated (see attached claims for more details).
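The difference between refining inside and outside the recursion loop can be illustrated with a deliberately reduced sketch: one row of blocks, a single candidate per block (the vector of the previous block), and a toy error that is the same for every block. The names `refine` and `estimate_row`, the synthetic error and the uniform true motion are assumptions for illustration; a real estimator would use the full candidate set and a block-dependent frame difference.

```python
def refine(v0, error):
    """Test the eight vectors at +/-1 pixel around v0; return v1 only if its
    error is smaller than that of v0, otherwise keep v0."""
    neighbours = [(v0[0] + rx, v0[1] + ry)
                  for ry in (-1, 0, 1) for rx in (-1, 0, 1) if (rx, ry) != (0, 0)]
    v1 = min(neighbours, key=error)
    return v1 if error(v1) < error(v0) else v0


def estimate_row(n_blocks, error, in_loop):
    """Recursive estimation along one row of blocks. The only candidate is the
    vector of the previous block (a deliberately reduced candidate set)."""
    field, previous = [], (0, 0)
    for _ in range(n_blocks):
        v = previous                      # V0: best (here the only) candidate
        if in_loop:
            v = refine(v, error)          # V1 replaces V0 before the next block
        field.append(v)
        previous = v
    if not in_loop:
        field = [refine(v, error) for v in field]   # post-processing afterwards
    return field


true_motion = (3, 2)


def err(v):
    # Toy stand-in for the frame difference: distance to the true motion.
    return abs(v[0] - true_motion[0]) + abs(v[1] - true_motion[1])


inside = estimate_row(5, err, in_loop=True)
outside = estimate_row(5, err, in_loop=False)
```

With refinement inside the loop the estimate reaches the true motion by the third block, because each correction is passed on to the next block; with refinement applied afterwards every block only gains a single pixel.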

Note that this technique may be used for any recursive motion estimation algorithm; it can
also be used for the motion estimation algorithm described in US 4,853,775, as depicted in
the attached block diagram, which shows an inventive improvement over Figure 13 of US 



The post processing of a preferred embodiment of this invention includes an integer pixel 
refinement around the motion vector that has been selected by the motion estimator; 
however, any refinement technique able to achieve more accurate motion field may be 
used. Very good results can be obtained if within the recursion loop, the integer pixel 
refinement is followed by a half pixel refinement; this solution results, however, in a 
relatively large computational load, so that the integer pixel refinement is preferred. 

In any case, in my opinion, and as far as I know, the new and important aspect is that,
provided that post-processing is done inside the recursion loop of any recursive motion
estimation algorithm instead of outside the recursion loop, the convergence of the recursive
motion estimation algorithm is sped up.

Another important aspect of a preferred embodiment of this invention is that the difference
between the output and the input of the enhancement module gives local information on
the trend of the motion (see Equation 2); this information can be exploited to determine an
additional candidate vector which contributes to further improving the performance of the
algorithm (see also attached block diagram). This can be regarded as an optional feature of
the proposed scheme, meaning that the enhancement by itself provides a conspicuous
performance gain, even if the above defined additional candidate vector is not added to the
original candidate set.

I would like to remark that both simulation results and subjective tests have confirmed the
effectiveness of the enhancement; a lot of time has been spent finding an efficient
technique to improve the performance of the 3D-RS algorithm, which provides poor
rate-distortion performance when used for H.263 video coding. Although many
optimization steps were carried out to tune various parameters of the 3D-RS algorithm, and
various techniques to refine the motion field estimated by the 3D-RS algorithm were
evaluated, only the adoption of the enhancement scheme described above resulted in a
considerable performance gain.

5 Conclusion

We have presented the design of the Enhanced 3D-RS motion estimation algorithm for
H.263 video coding, which provides satisfactory performance in terms of coding
efficiency while keeping the computational load low. We have shown that the Enhanced
3D-RS algorithm outperforms the improved iterative 3D-RS strategy, and we
have found that for typical videoconferencing sequences it is very close to full-search
motion estimation. Furthermore, we have seen that the recursive estimation strategy
stimulates better consistency of the motion field and leads to improved noise robustness of
the motion estimation process, in that the Enhanced 3D-RS is comparable with full-search
for noisy sequences.

The Enhanced 3D-RS algorithm has been successfully integrated with the H.263 video
codec for the Philips Trimedia processor (TM1000). This motion estimation algorithm
has been seen to lead to significant computational saving, and real-time experiments
have shown very good perceptual quality of the coded sequences.

6 Claims

1. A method and an apparatus for improving the accuracy of the motion field estimated
by a motion estimation algorithm, which allows improved convergence of the
motion estimation algorithm with respect to conventional methods, provided that
the motion field is estimated by a recursive motion estimation algorithm, recursive
meaning that the motion estimation algorithm computes the motion vector associated
with a picture portion (for example a block) by exploiting motion information
already determined for previous blocks.

2. A method and an apparatus as claimed in Claim 1 which generates eight motion
vectors from the motion vector $v_0$ that has been selected out of the corresponding
candidate vector set. This apparatus is called enhancement module.

3. A method and an apparatus, according to Claim 2, characterized in that each of
the eight vectors is obtained by adding a ±1 pixel displacement to each component of
the motion vector $v_0$ that has been selected out of the candidate vector set. The
motion vector $v_1$ out of said eight motion vectors with the smallest frame difference
is selected: if the frame difference of said motion vector $v_1$ is smaller than the frame
difference of the motion vector $v_0$, the motion vector $v_1$ immediately replaces the
motion vector $v_0$ before the new set of candidate vectors for the motion vector
associated with the next block is generated.



4. A method and an apparatus as claimed in Claim 1, which is able to provide local
information on the trend of the motion by computing an update vector given by the
difference between output and input of the enhancement. This update vector can
be used to generate one more candidate vector by adding the update vector to one
of the vectors of the candidate vector set.

5. A method and an apparatus as claimed in any one of the preceding Claims, which
speeds up the convergence of the 3D-RS motion estimation algorithm to a more
accurate motion field.

6. A method and an apparatus as claimed in any one of the preceding Claims, which
allows substantial rate-distortion performance gain when applied to H.263 video
coding, as well as improved subjective quality.